Architecture Design

Overview

Our data processing and management system integrates several components that take data from raw inputs to actionable business insights. This document outlines the architecture and function of each key component in the system.

Data Warehouse

Function: Central repository for storing all system data.

Details: All incoming data is stored in tables within the data warehouse, so every downstream component reads from a single structured store.

Download and Decode Data

Component: Spark Jobs

Function: Download and decode data.

Process:

  • Download: Spark jobs download raw data from an RPC service.
  • Save: The downloaded raw data is saved to the data warehouse.
  • Decode: Raw data is decoded into source data (calls and events) using each protocol's ABI, which is stored in the CMS database.
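
The decode step can be illustrated with a minimal sketch. It assumes EVM-style event logs (a `topics` list plus a `data` hex string), which this document does not state explicitly, and it handles only the static types `address` and `uint256`. `ABI_REGISTRY` is a hypothetical in-memory stand-in for the ABI entries held in the CMS database, and `decode_log` is an illustrative helper, not the Spark job's actual code.

```python
# Well-known topic0 for ERC-20 transfers:
# Keccak-256 of "Transfer(address,address,uint256)".
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

# Hypothetical stand-in for ABI entries stored in the CMS database,
# keyed by event signature hash (topic0).
ABI_REGISTRY = {
    TRANSFER_TOPIC: {
        "name": "Transfer",
        "indexed": [("from", "address"), ("to", "address")],
        "data": [("value", "uint256")],
    }
}


def decode_word(word_hex: str, abi_type: str):
    """Decode one 32-byte ABI word (64 hex chars) for the two types used here."""
    if abi_type == "address":
        return "0x" + word_hex[-40:]  # addresses occupy the low 20 bytes
    if abi_type == "uint256":
        return int(word_hex, 16)
    raise ValueError(f"unsupported ABI type: {abi_type}")


def decode_log(log: dict) -> dict:
    """Decode a raw log into a named event using the registry entry for topic0."""
    entry = ABI_REGISTRY[log["topics"][0]]
    decoded = {"event": entry["name"]}
    # Indexed parameters live in topics[1:], one 32-byte word each.
    for (name, abi_type), topic in zip(entry["indexed"], log["topics"][1:]):
        decoded[name] = decode_word(topic[2:], abi_type)
    # Non-indexed static parameters are consecutive 32-byte words in data.
    data_hex = log["data"][2:]
    for i, (name, abi_type) in enumerate(entry["data"]):
        decoded[name] = decode_word(data_hex[64 * i : 64 * (i + 1)], abi_type)
    return decoded


# Synthetic raw log, built here for illustration only.
raw_log = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,
        "0x" + "00" * 12 + "22" * 20,
    ],
    "data": "0x" + format(1000, "064x"),
}
decoded_event = decode_log(raw_log)
```

Keying the registry by topic0 means a new protocol can be supported by adding an ABI entry rather than changing the decoding code, which is presumably why the ABIs are managed separately in the CMS.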

Admin CMS

Function: Management of ABIs and decoded data.

Details: The CMS manages the Application Binary Interfaces (ABIs) of the protocols to be decoded. ABI data is stored in the CMS database, which the decode step reads to interpret raw data correctly.

Data Transformer

Component: Transformer (dbt project)

Function: Transform source data into business tables.

Role: Similar to the Spellbook on Dune, the Transformer processes and refines source data tables into business-ready tables, enabling meaningful analysis and reporting.
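
As a rough illustration of what such a transformation looks like, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the data warehouse. It is not dbt itself: a dbt model is essentially a `SELECT` statement that dbt materializes as a table or view, and `CREATE TABLE ... AS SELECT` mimics that here. The table names `source_transfers` and `daily_transfer_volume`, the columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the data warehouse

# Hypothetical source table of decoded events.
conn.execute(
    "CREATE TABLE source_transfers (block_date TEXT, token TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO source_transfers VALUES (?, ?, ?)",
    [
        ("2024-01-01", "TOKEN_A", 100),
        ("2024-01-01", "TOKEN_A", 250),
        ("2024-01-02", "TOKEN_A", 75),
    ],
)

# The SELECT below plays the role of a dbt model body; wrapping it in
# CREATE TABLE ... AS materializes the result as a business table.
conn.execute(
    """
    CREATE TABLE daily_transfer_volume AS
    SELECT block_date,
           token,
           COUNT(*)   AS transfer_count,
           SUM(value) AS total_value
    FROM source_transfers
    GROUP BY block_date, token
    """
)

business_rows = conn.execute(
    "SELECT block_date, token, transfer_count, total_value "
    "FROM daily_transfer_volume ORDER BY block_date"
).fetchall()
```

The business table holds one row per day and token, so a dashboard can read pre-aggregated values instead of scanning every decoded event.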

Workflow Orchestration

Component: Airflow

Function: Administration and scheduling.

Role: Airflow handles the scheduling and coordination of tasks across the system, ensuring that data processing workflows are executed efficiently and reliably.
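
The ordering that the scheduler has to respect can be sketched with Python's standard `graphlib` module. This is not Airflow's API; it only illustrates the dependency graph that an Airflow DAG would encode. The task names are illustrative labels for the stages described in this document, and the strictly linear chain is an assumption.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Task names are illustrative, not identifiers from the real system.
pipeline = {
    "download_raw_data": set(),
    "save_raw_data": {"download_raw_data"},
    "decode_raw_data": {"save_raw_data"},
    "run_transformer": {"decode_raw_data"},
}

# static_order() yields tasks so that every task follows its prerequisites.
execution_order = list(TopologicalSorter(pipeline).static_order())
```

In Airflow the same relationships would be declared between tasks inside a DAG, with the scheduler deciding when each run starts.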

Query Engine

Primary Engine: Trino Query Engine

Function: Execute SQL queries from the BI tool.

Details:

  • Infrastructure: Runs on Amazon EMR (Elastic MapReduce), AWS's managed cluster service for big data workloads.
  • Operation: Receives SQL queries from the BI tool (Metabase), reads data from the data warehouse, performs calculations, and summarizes results to be returned to the BI tool.

Business Intelligence Tool

Component: Metabase

Function: User interface for data analysis and visualization.

Capabilities:

  • Querying: Allows users to write SQL queries.
  • Visualization: Users can create charts and build dashboards.
  • Data Storage: The BI application's data, including user-created charts and dashboards, is stored in a PostgreSQL database.